
    Explaining the Decisions of Deep Policy Networks for Robotic Manipulations

    Deep policy networks enable robots to learn behaviors for solving a variety of complex real-world tasks in an end-to-end fashion. However, they lack the transparency to provide the reasons for their actions. Such black-box models therefore often exhibit low reliability and disruptive actions when the robot is deployed in practice. To enhance transparency, it is important to explain robot behaviors by considering the extent to which each input feature contributes to determining a given action. In this paper, we present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models. To this end, we present two methods for applying input attribution methods to robot policy networks: (1) we measure the importance factor of each joint torque to reflect the influence of the motor torque on the end-effector movement, and (2) we modify a relevance propagation method to properly handle negative inputs and outputs in deep policy networks. To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.
    Comment: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)
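    The per-feature attribution idea can be illustrated with a minimal gradient-x-input sketch on a toy linear policy; the paper's actual method (an importance factor for joint torques plus a modified relevance propagation rule) is more involved, and all weights and sensor readings below are hypothetical.

    ```python
    # Minimal sketch of per-feature attribution, assuming a toy linear policy.
    # This is gradient-x-input, not the paper's modified relevance propagation.

    def linear_policy(weights, features):
        """Toy deterministic policy: one scalar action = weighted feature sum."""
        return sum(w * x for w, x in zip(weights, features))

    def gradient_x_input(weights, features):
        """For a linear policy, d(action)/d(x_i) = w_i, so the
        gradient-x-input attribution of feature i is simply w_i * x_i."""
        return [w * x for w, x in zip(weights, features)]

    weights = [0.5, -1.0, 2.0]   # hypothetical learned weights
    features = [1.0, 0.2, 0.1]   # hypothetical multi-modal sensor readings
    attributions = gradient_x_input(weights, features)
    ```

    For a linear model this satisfies completeness: the attributions sum exactly to the policy output, which is what makes per-feature contributions directly comparable.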

    Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills

    Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for acquiring complex skills autonomously, without a task-oriented reward function, through mutual information (MI) maximization or variational empowerment. However, learning complex skills is still challenging because the order in which skills are trained can largely affect sample efficiency. Inspired by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.
    Comment: ICML 2023. First two authors contributed equally. Code at https://github.com/seongun-kim/vcr
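    The MI-maximization objective underlying variational empowerment can be sketched with a DIAYN-style intrinsic reward, r = log q(z|s) − log p(z); VCRL/VUVC build on this family but add the value-uncertainty-driven curriculum, which is not shown here, and the probabilities below are hypothetical.

    ```python
    # Hedged sketch of a DIAYN-style intrinsic reward, a common MI-based
    # skill-discovery objective. The discriminator outputs are made-up values.
    import math

    def intrinsic_reward(disc_prob_z_given_s, prior_prob_z):
        """r = log q(z|s) - log p(z): positive when the discriminator can
        infer the active skill z from the visited state s better than
        chance, i.e. when the skill produces distinguishable states."""
        return math.log(disc_prob_z_given_s) - math.log(prior_prob_z)

    # Uniform prior over 4 skills (p(z) = 0.25).
    r_distinct = intrinsic_reward(0.9, 0.25)    # state clearly reveals the skill
    r_ambiguous = intrinsic_reward(0.25, 0.25)  # state reveals nothing -> 0
    ```

    Maximizing this reward pushes each skill toward states that identify it, which is how MI maximization yields behaviorally distinct skills without any task reward.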

    Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans

    Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks by training trajectory diffusion models and conditioning the sampled trajectories using auxiliary guidance functions. However, as generative models, diffusion models are not guaranteed to generate feasible plans, resulting in failed execution and precluding planners from being useful in safety-critical applications. In this work, we propose a novel approach to refining unreliable plans generated by diffusion models by providing refining guidance to error-prone plans. To this end, we suggest a new metric, named the restoration gap, for evaluating the quality of individual plans generated by the diffusion model. The restoration gap is estimated by a gap predictor, which produces restoration gap guidance to refine a diffusion planner. We additionally present an attribution map regularizer to prevent adversarial refining guidance that could arise from a sub-optimal gap predictor, enabling further refinement of infeasible plans. We demonstrate the effectiveness of our approach on three different benchmarks in offline control settings that require long-horizon planning. We also illustrate that our approach offers explainability by presenting the attribution maps of the gap predictor and highlighting error-prone transitions, allowing for a deeper understanding of the generated plans.
    Comment: NeurIPS 2023. First two authors contributed equally. Code at http://github.com/leekwoon/rg
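    The idea of refining an error-prone plan with guidance from a scalar predictor can be sketched as gradient descent on a toy "gap" score; the actual method trains a restoration-gap predictor and applies its gradient during diffusion sampling, so everything below (the quadratic gap and the finite-difference update) is a simplified stand-in.

    ```python
    # Hedged sketch: nudging an infeasible plan along the gradient of a
    # scalar feasibility score. The gap function and waypoints are toy
    # stand-ins for the learned gap predictor and a sampled trajectory.

    def gap(plan):
        """Toy stand-in for the gap predictor: penalizes large jumps
        between consecutive waypoints, which an agent could not execute."""
        return sum((b - a) ** 2 for a, b in zip(plan, plan[1:]))

    def refine(plan, step=0.1, iters=50, eps=1e-4):
        """Refine the plan by finite-difference gradient descent on the gap,
        mimicking how refining guidance adjusts error-prone plans."""
        plan = list(plan)
        for _ in range(iters):
            grads = []
            for i in range(len(plan)):
                bumped = plan[:]
                bumped[i] += eps
                grads.append((gap(bumped) - gap(plan)) / eps)
            plan = [p - step * g for p, g in zip(plan, grads)]
        return plan

    raw_plan = [0.0, 5.0, 0.0, 5.0]   # hypothetical jumpy 1-D waypoints
    refined_plan = refine(raw_plan)    # jumps are smoothed out
    ```

    In the paper the gradient comes from a learned predictor rather than finite differences, and the update is folded into the denoising process; the effect, lowering the estimated gap of the sampled plan, is the same in spirit.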

    Adaptive and Explainable Deployment of Navigation Skills via Hierarchical Deep Reinforcement Learning

    For robotic vehicles to navigate robustly and safely in unseen environments, it is crucial to decide on the most suitable navigation policy. However, most existing deep reinforcement learning based navigation policies are trained with a hand-engineered curriculum and reward function, which are difficult to deploy in a wide range of real-world scenarios. In this paper, we propose a framework that learns a family of low-level navigation policies and a high-level policy for deploying them. The main idea is that, instead of learning a single navigation policy with a fixed reward function, we simultaneously learn a family of policies that exhibit different behaviors under a wide range of reward functions. We then train a high-level policy that adaptively deploys the most suitable navigation skill. We evaluate our approach in simulation and the real world and demonstrate that our method can learn diverse navigation skills and adaptively deploy them. We also illustrate that our proposed hierarchical learning framework offers explainability by providing semantics for the behavior of an autonomous agent.
    Comment: ICRA 2023. First two authors contributed equally. Code at https://github.com/leekwoon/hrl-na
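    The two-level structure, a family of low-level skills plus a high-level selector, can be sketched as follows; the skill behaviors and the hand-written selection rule are purely illustrative stand-ins for what are, in the paper, policies learned with RL.

    ```python
    # Hedged sketch of hierarchical skill deployment. The three skills and
    # the obstacle-density rule are hypothetical; a learned high-level
    # policy would replace the if/else selector.

    def cautious_skill(obs):   return {"speed": 0.3, "label": "cautious"}
    def balanced_skill(obs):   return {"speed": 0.6, "label": "balanced"}
    def aggressive_skill(obs): return {"speed": 1.0, "label": "aggressive"}

    # A family of low-level policies with different reward trade-offs.
    SKILLS = [cautious_skill, balanced_skill, aggressive_skill]

    def high_level_policy(obs):
        """Toy selector: the more cluttered the scene, the more cautious
        the deployed skill."""
        if obs["obstacle_density"] > 0.7:
            return 0
        if obs["obstacle_density"] > 0.3:
            return 1
        return 2

    obs = {"obstacle_density": 0.8}              # hypothetical cluttered scene
    action = SKILLS[high_level_policy(obs)](obs)  # deploy the chosen skill
    ```

    The selector's discrete choice is also what gives the framework its semantics: the index of the deployed skill ("cautious", "aggressive") labels the agent's current behavior.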

    Contractile force is enhanced in Aortas from pendrin null mice due to stimulation of angiotensin II-dependent signaling.

    Pendrin is a Cl-/HCO3- exchanger expressed in the apical regions of renal intercalated cells. Following pendrin gene ablation, blood pressure falls, in part, from reduced renal NaCl absorption. We asked if pendrin is expressed in vascular tissue and if the lower blood pressure observed in pendrin null mice is accompanied by reduced vascular reactivity. Thus, the contractile responses to KCl and phenylephrine (PE) were examined in isometrically mounted thoracic aortas from wild-type and pendrin null mice. Although pendrin expression was not detected in the aorta, pendrin gene ablation changed contractile protein abundance and increased the maximal contractile response to PE when normalized to cross-sectional area (CSA). However, the contractile sensitivity to this agent was unchanged. The increase in contractile force/CSA observed in pendrin null mice was due to reduced cross-sectional area of the aorta and not to increased contractile force per vessel. The pendrin-dependent increase in maximal contractile response was endothelium- and nitric oxide-independent and did not arise from changes in Ca2+ sensitivity or chronic changes in catecholamine production. However, application of 100 nM angiotensin II increased force/CSA more in aortas from pendrin null than from wild-type mice. Moreover, angiotensin type 1 receptor inhibitor (candesartan) treatment in vivo eliminated the pendrin-dependent changes in contractile protein abundance and in contractile force/CSA in response to PE. In conclusion, pendrin gene ablation increases aorta contractile force per cross-sectional area in response to angiotensin II and PE due to stimulation of angiotensin type 1 receptor-dependent signaling. The angiotensin type 1 receptor-dependent increase in vascular reactivity may mitigate the fall in blood pressure observed with pendrin gene ablation.

    Pendrin Modulates ENaC Function by Changing Luminal HCO3−

    The epithelial Na+ channel, ENaC, and the Cl−/HCO3− exchanger, pendrin, mediate NaCl absorption within the cortical collecting duct and the connecting tubule. Although pendrin and ENaC localize to different cell types, ENaC subunit abundance and activity are lower in aldosterone-treated pendrin-null mice relative to wild-type mice. Because pendrin mediates HCO3− secretion, we asked if increasing distal delivery of HCO3− through a pendrin-independent mechanism “rescues” ENaC function in pendrin-null mice. We gave aldosterone and NaHCO3 to increase pendrin-dependent HCO3− secretion within the connecting tubule and cortical collecting duct, or gave aldosterone and NaHCO3 plus acetazolamide to increase luminal HCO3− concentration, [HCO3−], independent of pendrin. Following treatment with aldosterone and NaHCO3, pendrin-null mice had lower urinary pH and [HCO3−] as well as lower renal ENaC abundance and function than wild-type mice. With the addition of acetazolamide, however, acid-base balance as well as ENaC subunit abundance and function were similar in pendrin-null and wild-type mice. We explored whether [HCO3−] directly alters ENaC abundance and function in cultured mouse principal cells (mpkCCD). Amiloride-sensitive current and ENaC abundance rose with increased [HCO3−] on the apical or the basolateral side, independent of the substituting anion. However, ENaC was more sensitive to changes in [HCO3−] on the basolateral side of the monolayer. Moreover, increasing [HCO3−] on the apical and basolateral side of Xenopus kidney cells increased both ENaC channel density and channel activity. We conclude that pendrin modulates ENaC abundance and function, at least in part by increasing luminal [HCO3−] and/or pH.

    Force/CSA in response to Angiotensin II in isolated mouse thoracic aorta from wild type and pendrin null mice.

    Force/CSA was measured in response to angiotensin II (100 nM) in thoracic aortas from pendrin null (KO) and wild type (WT) mice (Panel A). Panel B shows Force/CSA in response to 100 nM angiotensin II expressed as the percentage of force/CSA observed in response to phenylephrine (10 µM). n = 7 in each group. *p<0.05.